On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Similar resources
Convergence of Stochastic Iterative Dynamic Programming Algorithms
Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful te...
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD algorithm of Sutton and the Q-learning algorithm of Watkins, can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-ba...
Soft Dynamic Programming Algorithms: Convergence Proofs
Algorithms based on dynamic programming (DP) find optimal solutions to finite-state optimal control tasks by iterating a "backup" operator that only considers the consequences of executing the "best" action in a state. In many problem domains, the optimal solution may be "brittle" and it may be desirable to find robust, if suboptimal, solutions that prefer states that have many "good" actions to choo...
Convergence of Sample Path Optimal Policies for Stochastic Dynamic Programming
We consider the solution of stochastic dynamic programs using sample path estimates. Applying the theory of large deviations, we derive probability error bounds associated with the convergence of the estimated optimal policy to the true optimal policy, for finite horizon problems. These bounds decay at an exponential rate, in contrast with the usual canonical (inverse) square root rate associat...
Convergence of Numerical Method for Multistate Stochastic Dynamic Programming
Convergence of corrections is examined for a predictor-corrector method to solve Bellman equations of multi-state stochastic optimal control in continuous time. Quadratic costs and constrained control are assumed. A heuristically linearized comparison equation makes the nonlinear, discontinuous Bellman equation amenable to linear convergence analysis. Convergence is studied using the Fourier sta...
Journal
Journal title: Neural Computation
Year: 1994
ISSN: 0899-7667,1530-888X
DOI: 10.1162/neco.1994.6.6.1185